Algorithmic Energy Saving for Parallel Cholesky, LU, and QR Factorizations
Authors
Abstract
Slack is pervasive in runs of high performance applications, in the presence of various performance boosting solutions. This slack provides ample opportunities for improving the energy efficiency of high performance computing. Apart from communication slack, the classic approaches to saving energy during slack are race-to-halt and CP-aware slack reclamation, both of which rely on power scaling techniques to adjust processor power states judiciously during the slack. Existing efforts demonstrate that CP-aware slack reclamation is superior to race-to-halt in energy saving capability. In this paper, we formally model our observation that the energy saving capability gap between the two approaches narrows significantly on today’s processors, because state-of-the-art CMOS technologies allow only insignificant variation of supply voltage as the operating frequency of a processor scales. We also validate the model experimentally on a large-scale power-aware cluster.
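The narrowing gap described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes the textbook dynamic power relation P = C·f·V², ignores leakage and switching overhead, interpolates voltage linearly between two operating points, and uses made-up frequencies, voltages, and an assumed `idle_activity` factor. All function names are hypothetical.

```python
def voltage_at(f, f_lo, f_hi, v_lo, v_hi):
    """Assumed linear voltage/frequency relation between two operating points."""
    return v_lo + (f - f_lo) / (f_hi - f_lo) * (v_hi - v_lo)


def dynamic_power(f, v, c=1.0):
    """Textbook CMOS dynamic power, P = C * f * V^2 (leakage ignored)."""
    return c * f * v * v


def race_to_halt_energy(work, slack, f_lo, f_hi, v_lo, v_hi, idle_activity=0.3):
    """Run at the highest frequency, then idle at the lowest during the slack."""
    busy_time = work / f_hi
    busy = dynamic_power(f_hi, v_hi) * busy_time
    idle = idle_activity * dynamic_power(f_lo, v_lo) * slack
    return busy + idle


def slack_reclamation_energy(work, slack, f_lo, f_hi, v_lo, v_hi, idle_activity=0.3):
    """Slow the task down so that it fills the slack, never going below f_lo."""
    busy_time = work / f_hi
    f = max(f_lo, work / (busy_time + slack))
    v = voltage_at(f, f_lo, f_hi, v_lo, v_hi)
    run_time = work / f
    leftover = busy_time + slack - run_time  # nonzero only when clamped at f_lo
    idle = idle_activity * dynamic_power(f_lo, v_lo) * leftover
    return dynamic_power(f, v) * run_time + idle


def relative_gap(v_lo, v_hi, work=2.0, slack=0.5, f_lo=1.0, f_hi=2.0):
    """Fraction of race-to-halt energy that slack reclamation additionally saves."""
    rth = race_to_halt_energy(work, slack, f_lo, f_hi, v_lo, v_hi)
    rec = slack_reclamation_energy(work, slack, f_lo, f_hi, v_lo, v_hi)
    return (rth - rec) / rth


if __name__ == "__main__":
    # Illustrative voltage ranges only: a wide range vs. a narrow one.
    print("wide voltage range  :", relative_gap(v_lo=0.9, v_hi=1.5))
    print("narrow voltage range:", relative_gap(v_lo=1.1, v_hi=1.2))
```

Under these assumptions, reclamation wins mainly through the V² term, so shrinking the voltage range shrinks its advantage over race-to-halt. What remains in this toy model is the idle energy that race-to-halt spends during the slack.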
Similar resources
LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs
We present performance results for dense linear algebra using the 8-series NVIDIA GPUs. Our matrix-matrix multiply routine (GEMM) runs 60% faster than the vendor implementation in CUBLAS 1.1 and approaches the peak of hardware capabilities. Our LU, QR and Cholesky factorizations achieve up to 80–90% of the peak GEMM rate. Our parallel LU running on two GPUs achieves up to ~300 Gflop/s. These re...
Fill-in reduction in sparse matrix factorizations using hypergraphs
We discuss partitioning methods using hypergraphs to produce fill-reducing orderings of sparse matrices for Cholesky, LU and QR factorizations. For the Cholesky factorization, we investigate a recent result on pattern-wise decomposition of sparse matrices, generalize the result, and develop algorithmic tools to obtain more effective ordering methods. The generalized results help us to develop f...
PoLAPACK: parallel factorization routines with algorithmic blocking
LU, QR, and Cholesky factorizations are the most widely used methods for solving dense linear systems of equations, and have been extensively studied and implemented on vector and parallel computers. Most of these factorization routines are implemented with block-partitioned algorithms in order to perform matrix-matrix operations, that is, to obtain the highest performance by maximizing reuse of...
Design and Performance Modeling of Parallel Block Matrix Factorizations for Distributed Memory Multicomputers
Efficient and scalable parallel block algorithms for the LU factorization with partial pivoting, the Cholesky, and QR factorizations in a distributed memory multicomputer environment are presented. The distributed system is viewed as a ring of processors and the algorithms correspond to shared memory algorithms parallelized on block level (explicit parallelism). Performance of the algorithms are ...
Parallel Block Matrix Factorizations for Distributed Memory Multicomputers
Efficient and scalable parallel block algorithms for the LU factorization with partial pivoting, the Cholesky, and QR factorizations in a distributed memory multicomputer environment are presented. The distributed system is viewed as a ring of processors and the algorithms correspond to shared memory algorithms parallelized on block level (explicit parallelism). Performance of the algorithms are...